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Middle East respiratory syndrome coronavirus (MERS-CoV) originated in bats and spread to 
humans via zoonotic transmission from camels. We analyzed the evolution of the spike (S) gene in 

: betacoronaviruses (betaCoVs) isolated from different mammals, in bat coronavirus populations, as 

: well as in MERS-CoV strains from the current outbreak. Results indicated several positively selected 
sites located in the region comprising the two heptad repeats (HR1 and HR2) and their linker. Two 
sites (R652 and V1060) were positively selected in the betaCoVs phylogeny and correspond to 
mutations associated with expanded host range in other coronaviruses. During the most recent 
evolution of MERS-CoV, adaptive mutations in the HR1 (Q/R/H1020) arose in camels or in a previous 
host and spread to humans. We determined that different residues at position 1020 establish distinct 

: inter- and intra-helical interactions and affect the stability of the six-helix bundle formed by the HRs. 

: A similar effect on stability was observed for a nearby mutation (T1015N) that increases MERS-CoV 

: infection efficiency in vitro. Data herein indicate that the heptad repeat region was a major target of 
adaptive evolution in MERS-CoV-related viruses; these results are relevant for the design of fusion 
inhibitor peptides with antiviral function. 


: Middle East respiratory syndrome coronavirus (MERS-CoV), a newly emerged virus that can cause severe 
: lower respiratory tract infection in humans, was first identified in Saudi Arabia in 2012!. Since then, 945 
: laboratory-confirmed cases of MERS-CoV infection, leading to at least 348 related deaths, have been reported 
: to the WHO (as of January 5%, 2015) (http://www.who.int/csr/don/05-january-2015-mers-jordan/en/). 
MERS-CoV belongs to the clade c of betacoronaviruses (betaCoVs)*, which also includes two bat 
sister species, namely Ty-BatCoV HKU4 and Pi-BatCoV HKUS5, isolated from the lesser bamboo bats 
(Tylonycteris pachypus) and Japanese pipistrelles (Pipistrellus abramus), respectively’. Additional viruses 
related to MERS-CoV have been described in bats (BtCoV/KW2E-F93, BtCoV/133) and hedgehogs 
(Erinaceus coronavirus, EriCoV)*”. Recently, a virus belonging to the same species as MERS-CoV was 
: isolated in Neoromicia bats (NeoCoV), supporting a bat-origin for MERS-CoV°*. This hypothesis received 
: further confirmation by the identification of dipeptidyl-peptidase 4 (DPP4, also known as CD26) as the 
: cellular receptor used by both Ty-BatCoV HKU4 and MERS-CoV7”. 
Notably, high titers of MERS-CoV neutralizing antibodies have been detected in camels from various 
countries and MERS-CoV has been isolated from these animals in Saudi Arabia and Qatar!?. Genetic 
diversity is slightly higher for camel-derived viruses compared to haman MERS-CoV isolates, suggesting 
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camel to human transmission rather than vice versa!?. Therefore, the most likely scenario envisages that 
a bat-derived MERS-CoV spread to humans via the zoonotic transmission from dromedary camels. 

Coronaviruses use their spike (S) protein to bind a host receptor and to promote membrane fusion. 
The spike protein assembles as a trimer on the viral surface and belongs to the class I fusion protein 
family''. Class I fusion proteins are found in many other virus genera including retroviruses, orthomyx- 
oviruses, paramyxoviruses, and filoviruses and share similar domain organization, as well as common 
functional properties’?. 

Host proteases cleave the CoV spike protein into two functionally distinct domains: the N-terminal 
region (usually referred to as the S1 subunit) contains the receptor binding domain (RBD), whereas the 
C-terminal portion (S2 subunit) includes the fusion peptide, two heptad repeats (HR1 and HR2), and the 
transmembrane (TM) domain” (see Fig. 1A). Following receptor binding, membrane fusion is mediated 
by a major conformational rearrangement that exposes the fusion peptide and results in the formation 
of a six-helix bundle (6HB)!*’. The core of the 6HB is a triple-stranded coiled coil formed by the HR1s 
of the three spike subunits forming the trimer; the HR2 elements pack within the grooves of the coiled 
coil in an antiparallel direction'>. 

Because of its central role in membrane fusion, a number of antiviral peptides that interfere with the 
6HB formation have been developed as potential therapeutic compounds against HIV’, Ebola virus’®, 
SARS-CoV!”!8, and MERS-CoV"™. 

Although the RBD of spike proteins is generally considered the major determinant of host range, sev- 
eral reports have suggested that variation in the C-terminal portion of spike proteins, particularly in the 
HRI and HR2, determine host range expansion’. Moreover, recent works indicated that MERS-CoV and 
Ty-BatCoV HKU4 bind DPP4 both of human and of bat origin’. In particular, although MERS-CoV 
binds human DPP4 with higher afhnity than Ty-BatCoV HKU4, which shows a preference for the bat 
receptor, the RBDs of the two viruses engage human DPP4 via a similar binding mode’. These obser- 
vations suggest that MERS-CoV and related viruses have the potential to shift host range with little 
adaptation of the RBD. 

Motivated by the notion that evolutionary analyses can provide information on the molecular events 
that underlie host shifts and, more generally, host-pathogen interactions’, we investigated the evolu- 
tionary history of S proteins in MERS-CoV and related betaCoVs. Specifically, we aimed to determine 
whether natural selection drove the evolution of specific regions and sites that may contribute to var- 
iation in host range or replication efficiency. Thus, using different strategies, we analyzed MERS-CoV 
strains isolated from human and camels, as well as MERS-CoV-related viruses from other mammals. 
Data indicate the HR1 to HR2 region as a major target of adaptive evolution in these viruses. 


Results 

Positive selection shaped the evolution of clade c betaCoV spike protein. We first investi- 
gated whether positive selection drove the evolution of MERS-CoV-related coronavirus spike proteins. 
Previous phylogenetic analyses of S genes of viruses isolated from humans/camels (MERS-CoV), bats, 
and hedgehogs (Supplementary Table. S1) indicated that the S1 and S2 regions display different tree 
topologies (Fig. 1B), possibly as a result of recombination®. Because recombination can inflate estimates 
of positive selection”, we separately analyzed the two regions. 

We pruned the S1 and S2 multiple sequence alignments (MSAs) from unreliably aligned codons and 
we screened them for evidence of additional recombination events using GARD (Genetic Algorithm 
Recombination Detection)*', which detected no breakpoint. 

The saturation of substitution rates represents a major problem in the detection of positive selection 
among distantly related sequences. Computation of the nonsynonymous (dN) and synonymous (dS) 
substitution rates over whole phylogenies allows breaking of long branches, resulting in improved rate 
estimation. Thus, evidence of episodic positive selection was searched for by using the codeml branch-site 
test”, which is relatively insensitive to the saturation of substitution rates”. In the S1 and S2 regions, 
3 and 4 branches yielded statistically significant evidence of positive selection under different codon 
frequency models (Fig. 1B, Table 1 and Supplementary Table S3). Positively selected sites along these 
branches were detected using the BEB (Bayes empirical Bayes) procedure and validated using the Mixed 
Effects Model of Evolution (MEME)"*. 

One positively selected site was found in the S1 region, 7 sites in the S2 subunit (Fig. 1A, Table 1). 
The R652 selected site (in S1) almost corresponds to two mutations that independently arose in the 
SARS-CoV spike gene as a result of in vitro adaptation of zoonotic strains to primate cells” (Fig. 1A). In 
S2, most selected sites are located in the HR1, HR2, and in the intervening linker. Remarkably, position 
1060 is the almost exact counterpart of aminoacid changes that expand the host range or cell-type tropism 
of infectious bronchiolitis virus (IBV, L857F) and murine hepatitis virus (MHV, E1035D) (Fig. 1A)*°?’. 

No positively selected site was found to be located in the RBD (Fig. 1A). Nevertheless, the pruning 
of unreliably aligned codons operated on the MSA left a minority of RBD sites available for analysis. We 
thus repeated the branch-site test on a subset of more closely related sequences (Fig. 1B). This procedure 
decreased divergence and pruning in the RBD and allowed analysis of most codons; even with this pro- 
cedure, no positively selected site was detected (not shown). 
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Figure 1. Positive selection at the spike gene of MERS-CoV-related coronaviruses. (A) Cartoon 
representation of the MERS-CoV spike protein with distinct domains in different colors (SP, signal peptide; 
NTD, N-terminal domain; RBD, receptor binding domain; FP, fusion peptide, HR1 and HR2, heptad repeat 
1 and 2; TM, transmembrane domain; CP, cytoplasmic domain). The location of positively selected sites 
detected in MERS-CoV related sequences is shown in red. A positively selected residue in MERS-CoV 
isolated from humans and camels is in orange. The positively selected site and recombination breakpoints 
in Pi-BatCoV HKU5 sequences are shown in blue and black, respectively. Furin cleavage sites are depicted 
as triangles°®. Two alignment portions are shown; positions that alter virus host range or tropism in SARS- 
CoV, MHV, and Beaudette strain IBV (IBV Beau CK) are highlighted in green”. A functional mutation 
which arose during tissue-culture adaptation of MERS-CoV (strain EMC2012) is highlighted in grey”. 

(B) Bayesian phylogenies for the S1 (left) and S2 (right) sequences®; branch length is proportional to dS. 
Branches in red were set as foreground lineages in independent branch-site tests. Thick branches yielded 
statistically significant evidence of positive selection. The bracket denotes a subset of sequences that were 
used for analysis of positive selection in the RBD. (C) OmegaMap results for Pi-BatCoV HKU5 spike genes. 
The hatched red line corresponds to a posterior probability of selection equal to 0.95. 
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Foreground branch Sites* identified by BEB 
Spike region (MA vs MA1)! —2AlInL? | p value (FDR corrected p value)? and MEME 
S1 subunit 
Node 18 30.92 2.69 x 10-8 (1.35 x 1077) R652 
Node 19 16.18 5.74 x 10™ (1.43 x 1074) — 
Node 20 1.51 0.218 (0.272) — 
Node 24 14.86 1.15 x 10-4 (1.92 x 1074) = 
Node 25 0.71 0.399 (0.399) — 
S2 subunit 
Node 15 14.45 1.44 x 1074 (7.20 x 107$) K854, 11180 
Node 16 12.86 3.37 x 10~*4 (8.43 x 1074) V1060 
Node 17 5.66 0.0173 (0.0216) M939, S1114, S1148 
Node 18 0 1 (1) 
Node 23 7.20 7.30 x 10° (0.0121) A1275 

















Table 1. Likelihood ratio test statistics for branch-site tests (clade c betaCoV). ‘MA and MA1 are 
branch-site models: MA allows a proportion of codons with dN/dS > 1 on the foreground branches, whereas 
the MA1 model does not. The F61 codon frequency model was used. °2AlnL is twice the difference of the 
natural logs of the maximum likelihood of the models being compared. *Degrees of freedom = 1. “Positions 
are relative to the MERS-CoV sequence (EMC/2012). 
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Minor effect of positive selection for the S genes of Ty-BatCoV HKU, and Pi-BatCoV 
HKU5. Previous analysis of Ty-BatCoV HKU4 and Pi-BatCoV HKU5 viruses isolated in Hong Kong 
indicated that positive selection targeted the spike protein and particularly its S1 region”. Nevertheless, 
recombination was not accounted for in that analysis. We thus analyzed the sequence alignments of the 
Hong Kong isolates for the presence of recombination breakpoints using GARD. The algorithm detected 
9 recombination breakpoints for the Pi-BatCoV HKU5 alignment (Fig. 1A) and none for Ty-BatCoV 
HKU4. The Ty-BatCoV HKU4 spike gene was therefore analyzed using the codeml site models, which 
test the hypothesis that a subset of codons evolve with dN/dS > 1. No evidence of positive selection was 
found, even using the relatively non-conservative model M7/model M8 comparison (Supplementary 
Table S4). As for the Pi-BatCoV HKU5 spike gene, rampant recombination prevents application of a 
similar approach. We thus resorted to the simultaneous estimation of selection and recombination using 
omegaMap”’. This analysis confirmed high recombination along the whole gene (Supplementary Fig. S1) 
and detected a single positively selected site in the S1 region (Fig. 1C). The same site (codon 198, position 
relative to the MERS-CoV spike sequence) was also detected by MEME, which was run by incorporating 
the alternative phylogenies detected by GARD. Most of the previously reported selected sites’ were not 
detected using other methods that account for recombination (Supplementary Table S5). We thus con- 
sider that robust inference of positive selection in Pi-BatCoV HKU5 spike genes can only be made for 
position 198, which lies outside the RBD (Fig. 1A). 


Positive selection in the MERS-CoV heptad repeat 1. We next wished to determine whether 
positive selection occurred at the spike gene of MERS-CoV viruses circulating in the recent outbreak. A 
previous study suggested that positive selection drove the evolution of two codons in the MERS-CoV 
spike gene (positions 509 and 1020)”. Nevertheless, in that study only one method was used to infer 
selection and sequences isolated from camels were not included. 

We thus retrieved 54 fully or almost fully spike sequences of MERS-CoV isolated from camels or 
humans (Supplementary Table S1). Alignments for the S1 and S2 regions were separately analyzed and 
screened for the presence of recombination. No breakpoint was detected and the codeml site models 
were applied. For the S2 region, two models of gene evolution that allow a class of codons to evolve 
with dN/dS > 1 (NSsite models M2a and M8) showed better fit to the data than the null models (NSsite 
models Mla and M7), strongly supporting the action of positive selection (Table 2, Supplementary Table 
S6). No evidence of selection was detected for the S1 portion (Table 2, Supplementary Table S6). In 
S2, both BEB and MEME detected one selected site: position 1020 in the HRI (Fig. 1A). Interestingly, 
three different residues are observed at this site in both camel- and human-derived viruses (Fig. 1A). 
MERS-CoV is thought to have spread from camels to humans; the presence of the three alternative res- 
idues in viruses isolated from camels suggests that adaptive evolution at this site occurred prior to the 
infection of humans. 

Interestingly, the 1020 variant is in proximity to a mutation (T1015N) (Figs 1A and 2A) that arose 
during tissue-culture adaptation of MERS-CoV (strain EMC2012) and increases replication efficiency”. 
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Positively 
selected 
Codon Degrees % of sites sites* 
Spike Frequency of (average | (BEB and 
region LRT model model freedom | —2AInL? | p value dN/dS) MEME) 
S1 subunit 
Mla vs M2a! F61 2 0.84 0.657 — 
M7 vs M8? F61 2 4.21 0.121 — 
S2 subunit 
Mla vs M2a! F61 2 7.38 0.0250 | 0.2 (15.83) 1020Q 
M7 vs M8? F61 2 9.41 0.0091 0.2 (16.32) 1020Q 





























Table 2. Likelihood ratio test statistics for models of variable selective pressure among sites in MERS- 
CoV isolates. 'M1a is a nearly neutral model that assumes one dN/dS (w) class between 0 and 1, and one 
class with w = 1; M2a (positive selection model) is the same as Mla plus an extra class of w > 1. °M7 is a 
null model that assumes that 0< w< 1 is beta distributed among sites; M8 (positive selection model) is 

the same as M7 but also includes an extra category of sites with w > 1. °2AlnL: twice the difference of the 
natural logs of the maximum likelihood of the models being compared. “Positions are relative to the MERS- 
CoV sequence (EMC/2012). 


Analysis of MERS-CoV HR1 variation. The structure of the 6HB of MERS-CoV has been solved'*"*; 
through its side chain, the Q1020 residue forms hydrogen bonds with D1024 and interacts with M1266 
in HR2 (Fig. 2A,B). Replacement of the glutamine residue with histidine or arginine (observed in the 
camel- and human-derived viruses) results in loss of side chain interactions with M1266 and variably 
affects hydrogen bonds with D1024 (Fig. 2B). 

To gain further insight into the effect of adaptive evolution at position 1020, we performed a sta- 
bility computational analysis after in silico mutagenesis. Q1020 was replaced with all other possible 
aminoacids: even if changes of different magnitude in AG were obtained using three different meth- 
ods*’-**, trends were very consistent (Fig. 2C). In particular, replacement with histidine or arginine res- 
idues resulted in mild destabilization (Fig. 2C). As a comparison, the same analysis was performed for 
position 1015. Replacement of the threonine residue with asparagine, which was previously associated 
with increased replication efficiency in vitro*', resulted in a similar level of destabilization as observed 
for Q1020H/R (Fig. 2C). 


Discussion 

We analyzed the evolution of the S protein in betaCoVs, in bat Ty-BatCoV HKU4 and Pi-BatCoV HKU5 
viral populations, as well as in MERS-CoV isolates from the current outbreak. Different strategies were 
applied, as appropriate depending on divergence and recombination. 

Results indicated that several adaptive changes are located in the S2 region, with fewer in the S1 
domain and none of these within the RBD. It should be noted, though, that in all analyses we applied 
quite conservative approaches and we intersected two different methods to declare a site as positively 
selected. Whereas this approach was meant to limit the false positive rate it may have yielded some false 
negative results. In particular, the branch-site test we used to analyze the S1 and S2 regions of betaCoVs 
is robust to saturation issues and has a minimal false positive rate, but lacks power”. Moreover, due to 
the high divergence and the consequent need of alignment pruning, analysis of the RBD was performed 
on a shallower phylogeny compared to the other regions. This procedure is expected to reduce power, 
but is nonetheless necessary. In fact, alignment errors, together with unrecognized recombination, inflate 
estimates of positive selection and represent major sources of false positive results in evolutionary anal- 
yses**?, Consistently, when we accounted for recombination in Pi-BatCoV HKU5 sequences most pre- 
viously described selection signals disappeared, including those in the RBD”. Thus, whereas we cannot 
exclude that adaptive variants in the RBD of betaCoVs were missed by our approach, we conclude that 
the more recent evolution of MERS-CoV, Ty-BatCoV HKU4, and Pi-BatCoV HKU5 was not driven by 
positive selection in this domain. Conversely, as previously shown for SARS-CoV”, our data support a 
role for the S1 region separating the RBD and fusion peptide as a determinant of betaCoV host range 
expansion. 

Analysis of both betaCoVs and MERS-CoV strains revealed evidence of positive selection in the 
S2 region. Most positively selected sites were found to be located either in the heptad repeats or in the 
intervening linker. Among these sites, position 1060 is particularly interesting, as it almost corresponds 
to substitutions that modify the host range and/or cell tropism in MHV and IBV”°’’. Both viruses belong 
to the coronavirus genus. The Beaudette strain of IBV has been adapted to embryonated chicken eggs; 
following passages in culture, the strain was further adapted to infect Vero cells (from African Green 
monkey) and primary chicken kidney cells*®. The L857F mutation was shown to represent a major 
determinant of the fusogenic activity in these cell types*. As far as MHV is concerned, the E1035D 
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Figure 2. Analysis of variation in the MERS-CoV HRI. (A) Ribbon representation of MERS-CoV 

HR region. Positively selected sites in betaCoVs are shown in red; Q1020 (orange) and M1266 (yellow) 

are shown as sticks. T1015 is in blue. (B) Detail of inter- and intra-helical interactions for residue 1020. 
Interactions are shown for Q1020 (left), R1020 (middle) and H1020 (right). Hydrogen bonds are shown 

as hatched yellow lines; color codes: carbon, white; oxygen, red; nitrogen, blue; sulphur, yellow. Hydrogens 
have been omitted for clarity. (C) Stability analysis for HR1 positions 1020 (left) and 1015 (right). AAG in 
kcal/mol for mutations of Q1020 and T1015 to all other 19 aminoacids are reported. Aminoacid residues 
observed in MERS-CoV sequences are circled. Results are shown for FoldX, PopMuSiC, and I-Mutant. 


mutation was recovered after passages in mouse liver and was shown to contribute significantly to the 
hepatotropism and hepatic virulence of a previously attenuated strain (MHV-A59)’’. Additional variants 
in the HRI and fusion peptide of MHV strains cooperate with changes in the S1 region, resulting in a 
broadening of receptor usage (to heparan sulfate) and, consequently, an extension of the host range”®. 
Finally, in the MHV-A51 strain the ability to bind human CAECAM receptors is strongly influenced by 
mutant residues located in the fusion peptide, HR1 and HR1/HR2 linker”. Similar observations have 
been reported for viruses that do not belong to the coronavirus genus, but that use a class I fusion protein. 
For instance, one single mutation in the HR1 of a simian-human immunodeficiency virus (SHIV) strain 
(KB9) increases by two- to three-fold infection efficiency in cells expressing the marmoset cellular recep- 
tors’, whereas a nearby HRI change in SIV contributes to macrophage tropism*’. Overall, these findings 
pinpoint the relevance of changes in the HR1 and HR2 as modifiers of host range and cell-type tropism. 

The molecular mechanisms underlying the altered phenotype of HR1 and HR2 mutants remain to 
be determined in all these instances, although changes in conformational structure have been suggested 
as a possible explanation. Unfortunately, no coronavirus S protein fusion intermediate or pre-fusion 
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state has been solved to date, hampering investigation of molecular interactions. We therefore analyzed 
the effect of variation at the 1020 position of MERS-CoV on the stability of the 6HB in the post-fusion 
conformation". The presence of a Q1020 had previously been suggested to confer higher stability to the 
MERS-CoV 6HB compared to SARS-CoV™. Indeed, replacement of this residue results in loss of intra- 
and inter-helical interactions. In line with these observations, three different methods used for stability 
analysis were concordant in showing that the alternative arginine and histidine residues at position 1020 
result in a moderate and similar level of destabilization. Although the observed AAG is relatively small, 
it was calculated on the single monomer, and is expected to be multiplied in the trimer. The observation 
whereby mildly destabilizing variants are favored by selection may seem counterintuitive. Nonetheless, 
we show that a similar level of destabilization is observed for a mutation in MERS-Cov HRI (T1015N) 
that increases infection efficiency, at least in vitro’'. Mutagenesis of HR1 in retroviral type I fusion pro- 
teins has indicated that, whereas strong destabilization of the 6HB (as measured by circular dichroism) 
almost inevitably results in reduced infectivity, a minor stability decrease is not necessarily associated 
with defects in cell fusion and infection efficiency*®’. For instance, different aminoacid replacements 
at the same HRI position in HIV-1 gp41 result in decreased stability, but unaffected or even increased 
infectivity*’. In the case of EIAV (equine infectious anemia virus), destabilized HR1 mutants were found 
to display different infection phenotypes depending on temperature*. This observation may be extremely 
interesting in the context of MERS-CoV, as both bats and dromedary camels display adaptive hetero- 
thermy (i.e. sensible daily or season variation in body temperature). 

Coronavirus spike proteins are highly exposed on the virus surface and represent major targets for 
antibody response*’, raising the possibility that adaptive evolution of the S protein is driven by the host 
immune system. This hypothesis is difficult to address due to the paucity of information concerning 
the specific MERS-CoV epitopes recognized by human antibodies. Recently, different studies identified 
human antibodies againts MERS-CoV from non-immune human antibody libraries: all of them were 
directed against the RBD, suggesting that this region represents a major target for the host immune sys- 
tem”. Nevertheless, data on the humoral immune response to MERS-CoV in infected subjects are pres- 
ently lacking, whereas such information is richer for SARS-CoV. Indeed, analysis of antibodies derived 
from a patient who recovered from SARS indicated that some of them recognize epitopes in the HR2 
region”. Their neutralizing effect was ascribed to interference of the interaction between HR2 and HRI. 
Whether antibodies against the S2 region also arise in human subjects (or other mammalian hosts) 
infected with MERS-CoV and related betaCoVs remains to be determined; if this were the case, some of 
the selected sites we identified may be under selective pressure to evade recognition. 

Finally, it is worth noting that HRs have been studied in different viruses because synthetic pep- 
tides interfering with 6HB formation are promising antiviral molecules'*-'®. This is also the case for 
MERS-CoV, and HR2-like peptides were recently shown to be effective in vitro'*. These peptides were 
tested against a MERS-CoV strain carrying Q1020 and all include the interacting M1266 residue'*. These 
antivirals may display decreased activity depending on the MERS-CoV strain and its aminoacid status 
at the selected 1020 position. 


Materials and Methods 

Sequences and alignments. Virus sequences were retrieved from the NCBI database and a list 
of accession numbers is provided as Supplementary Table S1. Sequences of Ty-BatCoV HKU4 and 
Pi-BatCoV HKUS isolated in Hong Kong were derived from a previous work”®. 

Errors in the inferred multiple sequence alignment (MSA), which may be common when highly 
divergent sequences are analyzed, can inflate estimates of positive selection. We therefore used PRANK*® 
for building the MSA and GUIDANCE® for filtering unreliably aligned codons (i.e. we masked codons 
with a score <0.90), as suggested”. 


Detection of recombination and positive selection. To detect positive selection at the S gene 
of clade c betaCoVs we applied the branch-site test from the PAML suite”. The test compares a model 
(MA) that allows positive selection on one or more lineages (foreground lineages) with a model (MA1) 
that does not allow such positive selection. Twice the difference of likelihood for the two models (AlnL) 
is then compared to a x? distribution with one degree of freedom”. Specifically, the internal branches 
of previously reconstructed® Bayesian phylogenies of the S1 and S2 regions were set as the foreground 
lineages in independent tests. A false discovery rate (FDR) correction was applied to account for multiple 
hypothesis testing (i.e. we corrected for the number of tested branches), as suggested”. 

Positively selected sites were identified through the BEB analysis (with a p value cutoff of 0.90), which 
calculates the posterior probability that each site belongs to the site class of positive selection on the 
foreground branch(es). Sites were validated using MEME (with the default cutoff of 0.1), which allows 
the distribution of dN/dS (also referred to as w) to vary from site to site and from branch to branch at 
a site, therefore allowing the detection of episodic positive selection”. 

The site models implemented in PAML were applied -independently- for the analysis of HKU4 and 
MERS-CoV sequences, which display very limited divergence and do not suffer from saturation problems. 
To detect selection, site models that allow (M2a, M8) or disallow (Mla, M7) a class of sites to evolve with 
w > 1 were fitted to the data**. Trees were generated by maximum-likelihood using the PhyML program” 
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with a GTR model of nucleotide substitution and Ņ distributed rates. Positively selected sites were iden- 
tified using the BEB analysis (from model M8)°’. Again, sites were validated using MEME. 

To assure consistency, all models were run using the F3 x 4 and the F61 codon frequency models. 

MSAs were screened for the presence of recombination using GARD. Recombination breakpoints 
were considered significant if the HK (Kishino-Hasegawa) p value was <0.01. 

Simultaneous inference of selection and recombination for analysis of positive selection was per- 
formed using omegaMap”’, a program for detecting natural selection and recombination based on a 
model of population genetics and molecular evolution. The model uses a population genetic approxima- 
tion to the coalescent with recombination. This latter is estimated from patterns of linkage disequilib- 
rium assuming that recombination events occur only between codons and not within them. OmegaMap 
applies reversible-jump Markov Chain Monte Carlo (MCMC) to perform Bayesian inferences of both w 
and the recombination parameter p, allowing both parameters to vary along the sequence. An average 
block length of 10 and 30 codons was used to estimate w and p, respectively. To determine the influ- 
ence of the choice of the priors on the posteriors, analyses were repeated with alternative sets of priors 
(Supplementary Table S2). Three independent omegaMap runs, each with 500,000 iterations and a 50,000 
burn-in iteration, were compared to assess convergence and merged to obtain the posterior probability 
estimate. 

The REL (random effects likelihood) analysis models variation in both dN and dS across sites accord- 
ing to a predefined distribution with different rate classes; positively selected sites are identified through 
an empirical Bayes method’. The default criterion of a Bayes Factor >50 was used to identify positively 
selected sites. 

For GARD, MEME, and REL the nucleotide substitution models were chosen using a Genetic Algorithm 
implemented in the dataMonkey suite”. All analyses were performed through the DataMonkey server” 
(http://www.datamonkey.org). 


In silico analysis of HRa variants. ‘The crystal structure of MERS-CoV HRI and HR2 region was 
obtained from PDB (PDB ID: 4MOD). 

Histidine or arginine residues were introduced at positions 1020 and suitable rotamers were sampled 
through the rapid torsion scan utility in Maestro (Maestro. 9.1; Schrodinger). Intraprotein interactions 
were calculated with PIC (protein interaction calculator)”. 

Because programs that calculate stability changes achieve only moderate accuracy’, we used three 
different methods to assure reliability. These approaches are based on different principles. Specifically, 
PoPMuSiC uses statistical potentials and takes into account amino acid volume variation upon muta- 
tion”; FoldX uses an empirical force field and evaluates the energetic effect of point mutations and the 
interactions contributing to the stability of proteins”. Finally, I-Mutant 2.0 is based on a neural network 
approach to evaluate the free energy change after a single point mutation with incorporation of infor- 
mation on the three-dimensional structure of the protein*’. 

In FoldX and I-Mutant the AAG values are calculated as follows: AAG = AG mutant — AGwitd-type In 
FoldX and I-Mutant AAG values > 0kcal/mol indicate mutations that decrease protein stability, whereas 
in PoPMuSiC AAG values > 0kcal/mol are mark of mutations increasing protein stability. Therefore, 
PoPMuSiC AAG values were multiplied by —1 to obtain homogeneous results. 

In the analysis carried out with FoldX 3D, the three-dimensional structure of the protein was repaired 
using the <RepairPDB> command. Mutations were introduced using the <BuildModel> command 
with <numberOfRuns> set to 5 and <VdWdesign> set to 0. Temperature (298K), ionic strength 
(0.05 M) and pH (7) were set to default values and the force-field was used to predict the water mole- 
cules on the protein surface. 
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